The Kronos Incident
This Visual Analytics Assignment is based on the VAST Challenge 2021 Mini Challenge 2. The overview of the Challenge is about a Tethys-based GAStech that has been operating a natural gas production site in the island country og Kronos. Despite bringing in remarkable profits and developing strong relationships with the Kronos government, GAStech has not been as successful in demonstrating environmental stewardship.
In January 2014, while the leaders of GAStech were celebrating their successful initial public offering (IPO), several of the company’s employees go missing. It is suspected that an organisation called the Protectors of Kronos (POK) was involved in the employees’ disappearance, but things may not be what they seem.
Mini Challenge 2 is about analysing the movement and tracking data of the GAStech employees, as well as their card card transactions and loyalty card usage data. From which, any anomalies and suspicious behaviours of the GAStech employees will be identified and analysed.
Here.
The data sources used in the mini challenge are:
| Data sources | Description |
|---|---|
| car-assignments.csv | List of vehicle assignments by employee |
| Geospatial folder | ESRI shapefiles of Abila and Kronos |
| gps.csv | Vehicle tracking data |
| loyalty_data.csv | Loyalty card transaction data |
| cc_data.csv | Credit and debit card transaction data |
| MC2-Tourist.jpg | Tourist map of Abila with locations of interest identified |
Below are the list of R packages installed and used in this assignment.
packages = c('raster', 'sf', 'tmap', 'lubridate',
'tidyverse', 'igraph', 'tidygraph',
'ggraph', 'visNetwork', 'clock',
'DT', 'zoo', 'parcoords')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
The credit card and loyalty card csv files are first loaded into R.
cc_data <- read_csv("./data/cc_data.csv")
loyalty_data <- read_csv("./data/loyalty_data.csv")
Taking a look at the two datasets, we can observe that the timestamp, location and loyaltynum are in character field while price and last4ccnum are in numerical field. It is to be noted that the timestamp field should be in date-time format.
glimpse(cc_data)
Rows: 1,490
Columns: 4
$ timestamp <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp <chr> "1/6/2014", "1/6/2014", "1/6/2014", "1/6/2014", "~
$ location <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
Therefore, we need to convert the timestamp field in both datasets from character type to date-time type.
cc_data$timestamp <- date_time_parse(cc_data$timestamp,
zone = "",
format = "%m/%d/%Y %H:%M")
loyalty_data$timestamp <- date_time_parse(loyalty_data$timestamp,
zone = "",
format = "%m/%d/%Y")
Now, all the fields are in their correct data type.
glimpse(cc_data)
Rows: 1,490
Columns: 4
$ timestamp <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ location <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
In order to have a better understanding and visualisation on how the credit card and loyalty card usage are related, we would need to join both datasets together.
First, it is noted that the timestamp column in the cc_data dataset has both dates and time, while the timestamp column in the loyalty_data dataset only has the date field. So, we need to extract out the date from the timestamp column of the cc_data.
cc_data$date <- format(cc_data$timestamp, "%m/%d/%Y")
cc_data$date <- date_time_parse(cc_data$date,
zone = "",
format = "%m/%d/%Y")
head(cc_data)
# A tibble: 6 x 5
timestamp location price last4ccnum date
<dttm> <chr> <dbl> <dbl> <dttm>
1 2014-01-06 07:28:00 Brew've Be~ 11.3 4795 2014-01-06 00:00:00
2 2014-01-06 07:34:00 Hallowed G~ 52.2 7108 2014-01-06 00:00:00
3 2014-01-06 07:35:00 Brew've Be~ 8.33 6816 2014-01-06 00:00:00
4 2014-01-06 07:36:00 Hallowed G~ 16.7 9617 2014-01-06 00:00:00
5 2014-01-06 07:37:00 Brew've Be~ 4.24 7384 2014-01-06 00:00:00
6 2014-01-06 07:38:00 Brew've Be~ 4.17 5368 2014-01-06 00:00:00
head(loyalty_data)
# A tibble: 6 x 4
timestamp location price loyaltynum
<dttm> <chr> <dbl> <chr>
1 2014-01-06 00:00:00 Brew've Been Served 4.17 L2247
2 2014-01-06 00:00:00 Brew've Been Served 9.6 L9406
3 2014-01-06 00:00:00 Hallowed Grounds 16.5 L8328
4 2014-01-06 00:00:00 Coffee Shack 11.5 L6417
5 2014-01-06 00:00:00 Hallowed Grounds 12.9 L1107
6 2014-01-06 00:00:00 Brew've Been Served 4.27 L4034
Now, we can join both datasets together by the date, price and location. A new column, hour, is added to take note of the time period the employee visited the location.
cc_loyalty_data <- left_join(cc_data, loyalty_data,
by = c("date" = "timestamp",
"location" = "location",
"price" = "price"))
cc_loyalty_data$hour = hour(cc_loyalty_data$timestamp)
glimpse(cc_loyalty_data)
Rows: 1,496
Columns: 7
$ timestamp <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ date <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ loyaltynum <chr> "L8566", NA, "L8148", "L5553", "L3800", "L2247", ~
$ hour <int> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7~
Exploratory data analysis is conducted to determine the most popular locations frequented by the employees of GAStech.
Looking at the combined dataset, we are able to observe that the top five most popular locations are:
freq_by_location <- cc_loyalty_data %>%
select(location, price) %>%
group_by(location) %>%
summarise(total_price = sum(price), freq = n()) %>%
arrange(desc(freq)) %>%
ungroup()
ggplot(data=freq_by_location,
aes(x = reorder(location, -freq), y = freq)) +
geom_col() +
theme(axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.5)) +
ggtitle("Frequency of Visit") +
labs(y = "Frequency", x = "Location")

Delving deeper into the top 5 most frequented places over the span of two weeks, we can plot a calendar heatmap to show the frequencies of purchases made at the top 5 most popular locations.
freq_by_date <- cc_loyalty_data %>%
select(date, hour, location, price) %>%
group_by(date, hour, location) %>%
summarise(total_price = sum(price), freq = n()) %>%
arrange(desc(freq)) %>%
ungroup()
freq_by_date$weekday = as.POSIXlt(freq_by_date$date, format = "%d/%m/%Y")$wday
freq_by_date$weekdayf <- factor(freq_by_date$weekday,
levels = rev(0:6),
labels = rev(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
ordered = TRUE)
top5_plot <- freq_by_date %>%
na.omit() %>%
filter(location %in% c("Katerina's Cafe", "Hippokampos", "Guy's Gyros", "Brew've Been Served", "Hallowed Grounds"))
top5_heatmap <- ggplot(top5_plot, aes(hour, weekdayf, fill = top5_plot$freq)) +
geom_tile(colour = "white") +
facet_grid(~location) +
theme(strip.text = element_text(size = 7)) +
scale_fill_distiller(palette = "Reds", direction = 1) +
xlab("Hour") +
xlim(c(0,23)) +
ylab("Day") +
ggtitle("Calendar Heatmap: Transaction Volume at Top 5 Locations") +
labs(fill = "Frequency")
top5_heatmap

The following observations and insights can be derived from the calendar heatmap:
Moving on, two other calendar heatmaps are plotted to view the transaction volume comparison between weekdays and weekends.
weekday_plot <- freq_by_date %>%
na.omit() %>%
filter(weekday %in% c(1:5))
weekday_heatmap <- ggplot(weekday_plot, aes(hour, location, fill = weekday_plot$freq)) +
geom_tile(colour = "white") +
theme(strip.text = element_text(size = 7)) +
scale_fill_distiller(palette = "Reds", direction = 1) +
xlab("Hour") +
xlim(c(0,23)) +
ylab("Location") +
ggtitle("Calendar Heatmap: Transaction Volume during Weekdays") +
labs(fill = "Frequency")
weekday_heatmap

weekend_plot <- freq_by_date %>%
na.omit() %>%
filter(weekday %in% c(0,6))
weekend_heatmap <- ggplot(weekend_plot, aes(hour, location, fill = weekend_plot$freq)) +
geom_tile(colour = "white") +
theme(strip.text = element_text(size = 7)) +
scale_fill_distiller(palette = "Reds", direction = 1) +
xlab("Hour") +
xlim(c(0,23)) +
ylab("Location") +
ggtitle("Calendar Heatmap: Transaction Volume during Weekends") +
labs(fill = "Frequency")
weekend_heatmap

A boxplot show the spread of the transaction amount made by GAStech employees at different locations.
median_price <- cc_loyalty_data %>%
group_by(location) %>%
summarise(median_price = median(price))
spending_by_location <- cc_loyalty_data %>%
left_join(median_price,
by = c("location"))
ggplot(data = spending_by_location,
aes(x = reorder(location, -median_price), y = price)) +
geom_boxplot(outlier.colour = "Red", outlier.fill = "Red") +
geom_point(alpha = 0) +
scale_y_log10() +
theme(axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.5)) +
ggtitle("Transaction Amount by Location") +
labs(y = "Spending amount", x = "Location")

Based on the above plots, we are able to spot a few anormalies that are note-worthy.
From both the weekday and weekend calendar maps plotted above, there are transactions occurring at that location at the wee hours, when most people would be asleep. Further analysis revealed that there are a total of 5 transactions occurring on 3 different days (12 Jan, 13 Jan and 19 Jan) around 0300-0400 time period. The 5 transactions are all carried out on different credit cards, with no loyalty card used. It is also suspicious that the last two transactions occurred within the span of three minutes, suggesting a likely possibility that the last two card owners might have seen each other in the mart.
knitr::kable(cc_loyalty_data %>%
filter(location %in% "Kronos Mart" & hour %in% 3) %>%
dplyr::select(location, timestamp, date, hour, price, last4ccnum, loyaltynum) %>%
arrange(hour), "simple",
caption = "Transactions made in Kronos Mart")
| location | timestamp | date | hour | price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|
| Kronos Mart | 2014-01-12 03:39:00 | 2014-01-12 | 3 | 277.26 | 8156 | NA |
| Kronos Mart | 2014-01-13 03:00:00 | 2014-01-13 | 3 | 147.30 | 5407 | NA |
| Kronos Mart | 2014-01-19 03:13:00 | 2014-01-19 | 3 | 87.66 | 3484 | NA |
| Kronos Mart | 2014-01-19 03:45:00 | 2014-01-19 | 3 | 194.51 | 9551 | NA |
| Kronos Mart | 2014-01-19 03:48:00 | 2014-01-19 | 3 | 150.36 | 8332 | NA |
Based on the boxplot plotted above, we are able to identify an outlier transaction at Frydos Autosupply n’ More. There is an exceptionally high transaction of 10,000 dollars at the store where the median price is 149.30 dollars. Similarly, this transaction was not tagged to any loyalty card.
knitr::kable(spending_by_location %>%
filter(location %in% "Frydos Autosupply n' More") %>%
filter(price %in% max(price)) %>%
dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum),
caption = "Transactions made in Frydos Autosupply n' More")
| location | timestamp | date | hour | price | median_price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|---|
| Frydos Autosupply n’ More | 2014-01-13 19:20:00 | 2014-01-13 | 19 | 10000 | 149.3 | 9551 | NA |
Despite being a low transaction amount, this purchase is rather suspicious as that is not tied to any loyalty card. The maximum, median and minimum transaction amounts also seem to be the same value, suggesting that it could be a singular purchase. This also means that not many people make purchases from Daily Dealz over the span of two weeks. Further analysis confirmed the suspicion. Furthermore, the credit card used to make the purchase is same as the credit card used to make the hefty purchase at Frydos Autosupply n’ More. The timing where the transaction occurred is also rather suspicious as it is an early morning purchase.
knitr::kable(spending_by_location %>%
filter(location %in% "Daily Dealz") %>%
dplyr::select(location, timestamp, date, hour, price, last4ccnum, loyaltynum) %>%
arrange(hour), "simple",
caption = "Transactions made in Daily Dealz")
| location | timestamp | date | hour | price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|
| Daily Dealz | 2014-01-13 06:04:00 | 2014-01-13 | 6 | 2.01 | 9551 | NA |
Based on the boxplot plotted above, we are able to identify an outlier transaction at Albert’s Fine Clothing. There is large transaction of 1,239.41 dollars compared to the median spending amount of 211.47 dollars.
knitr::kable(spending_by_location %>%
filter(location %in% "Albert's Fine Clothing") %>%
filter(price %in% max(price)) %>%
dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum),
caption = "Transactions made in Albert's Fine Clothing")
| location | timestamp | date | hour | price | median_price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|---|
| Albert’s Fine Clothing | 2014-01-17 19:44:00 | 2014-01-17 | 19 | 1239.41 | 211.47 | 1321 | L4149 |
Based on the boxplot plotted above, we are able to identify an outlier transaction at Chostus Hotel. There is a higher than average transaction of 600 dollars compared to the median spending amount of 114.22 dollars.
knitr::kable(spending_by_location %>%
filter(location %in% "Chostus Hotel") %>%
filter(price %in% max(price)) %>%
dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum),
caption = "Transactions made in Chostus Hotel")
| location | timestamp | date | hour | price | median_price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|---|
| Chostus Hotel | 2014-01-18 12:03:00 | 2014-01-18 | 12 | 600 | 114.22 | 5010 | L2459 |
This credit card is highly suspicious due to the fact that it is used in three of the alleged suspicious transactions pointed out above at Kronos Mart, Frydos Autosupply n’ More and Daily Dealz. Upon inspection of the card transaction detail, we can see that the card owner does have a loyalty card, but it seems like they tend to selectively use their loyalty card for discounts.
Another dubious point to note is that for the first week, they use their credit card daily but during the next week, there are gaps in the transactions. Especially after 13 Jan (Mon), the credit card user stopped using their card for a few days before making transactions on 16 Jan (Thu).
cc_loyalty_data$weekday = as.POSIXlt(cc_loyalty_data$date, format = "%d/%m/%Y")$wday
cc_loyalty_data$weekdayf <- factor(cc_loyalty_data$weekday,
levels = rev(0:6),
labels = rev(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
ordered = TRUE)
knitr::kable(cc_loyalty_data %>%
filter(last4ccnum %in% 9551) %>%
dplyr::select(timestamp, weekdayf, location, price, last4ccnum, loyaltynum),
caption = "Transactions made with CC 9551")
| timestamp | weekdayf | location | price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|
| 2014-01-06 07:55:00 | Mon | Hallowed Grounds | 8.05 | 9551 | L5777 |
| 2014-01-06 13:21:00 | Mon | Hippokampos | 28.23 | 9551 | NA |
| 2014-01-06 20:26:00 | Mon | Albert’s Fine Clothing | 276.90 | 9551 | L5777 |
| 2014-01-07 07:46:00 | Tue | Hallowed Grounds | 84.44 | 9551 | NA |
| 2014-01-07 13:37:00 | Tue | Gelatogalore | 21.52 | 9551 | L5777 |
| 2014-01-08 07:56:00 | Wed | Hallowed Grounds | 12.86 | 9551 | NA |
| 2014-01-08 13:43:00 | Wed | Hippokampos | 39.80 | 9551 | L5777 |
| 2014-01-08 21:16:00 | Wed | Ouzeri Elian | 30.81 | 9551 | NA |
| 2014-01-09 07:50:00 | Thu | Hallowed Grounds | 34.45 | 9551 | NA |
| 2014-01-09 13:41:00 | Thu | Abila Zacharo | 89.41 | 9551 | NA |
| 2014-01-10 13:16:00 | Fri | Ouzeri Elian | 30.71 | 9551 | NA |
| 2014-01-11 13:37:00 | Sat | Hippokampos | 75.62 | 9551 | NA |
| 2014-01-11 19:44:00 | Sat | Shoppers’ Delight | 149.20 | 9551 | NA |
| 2014-01-12 14:06:00 | Sun | Hippokampos | 71.99 | 9551 | NA |
| 2014-01-13 06:04:00 | Mon | Daily Dealz | 2.01 | 9551 | NA |
| 2014-01-13 13:18:00 | Mon | U-Pump | 55.25 | 9551 | NA |
| 2014-01-13 13:28:00 | Mon | Hippokampos | 30.51 | 9551 | L5777 |
| 2014-01-13 19:20:00 | Mon | Frydos Autosupply n’ More | 10000.00 | 9551 | NA |
| 2014-01-13 19:30:00 | Mon | Ouzeri Elian | 28.75 | 9551 | L5777 |
| 2014-01-16 08:05:00 | Thu | Hallowed Grounds | 12.19 | 9551 | L5777 |
| 2014-01-16 13:28:00 | Thu | Guy’s Gyros | 10.27 | 9551 | L5777 |
| 2014-01-16 20:28:00 | Thu | Ouzeri Elian | 9.91 | 9551 | L5777 |
| 2014-01-17 08:04:00 | Fri | Hallowed Grounds | 9.40 | 9551 | L5777 |
| 2014-01-17 20:28:00 | Fri | Ouzeri Elian | 35.81 | 9551 | L5777 |
| 2014-01-18 13:32:00 | Sat | Abila Zacharo | 16.59 | 9551 | L5777 |
| 2014-01-18 19:26:00 | Sat | Ouzeri Elian | 61.56 | 9551 | NA |
| 2014-01-19 03:45:00 | Sun | Kronos Mart | 194.51 | 9551 | NA |
| 2014-01-19 19:49:00 | Sun | Ouzeri Elian | 32.77 | 9551 | NA |
Moving on, we will be analysing the vehicle data alongside with the credit and loyalty card transaction data.
First, the raster file, M2-tourist.tif, is imported into R.
bgmap <- raster("./data/MC2-tourist.tif")
Next, the vector GIS data file, Abila, is imported into R.
Abila_st <- st_read(dsn = "./data/Geospatial",
layer = 'Abila')
Reading layer `Abila' from data source
`C:\Users\conni\Documents\ISSS608_Visual_Analytics&Applications\connieyjx\DataViz_blog\_posts\2021-07-25-assignment-1\data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS: WGS 84
The vehicle tracking csv data file is also imported into R.
gps <- read_csv("./data/gps.csv")
glimpse(gps)
Rows: 685,169
Columns: 4
$ Timestamp <chr> "1/6/2014 6:28", "1/6/2014 6:28", "1/6/2014 6:28",~
$ id <dbl> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ lat <dbl> 36.07623, 36.07622, 36.07621, 36.07622, 36.07621, ~
$ long <dbl> 24.87469, 24.87460, 24.87444, 24.87425, 24.87417, ~
After taking a look at the gps.csv file, we converted the Timestamp field from character data type to date-time format. The id field is also converted from numerical data type to factor data type.
The gps dataframe is converted into a simple feature data frame.
gps_sf <- st_as_sf(gps,
coords = c("long", "lat"),
crs = 4326)
glimpse(gps_sf)
Rows: 685,169
Columns: 4
$ Timestamp <dttm> 2014-01-06 06:28:00, 2014-01-06 06:28:00, 2014-01~
$ id <fct> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ day <fct> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
$ geometry <POINT [°]> POINT (24.87469 36.07623), POINT (24.8746 36~
The gps points are then joined together into movement paths by using the drivers’ IDs as unique identifier.
gps_path <- gps_sf %>%
group_by(id, day) %>%
summarise(m = mean(Timestamp),
do_union = FALSE) %>%
st_cast("LINESTRING")
glimpse(gps_path)
Rows: 508
Columns: 4
Groups: id [40]
$ id <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, ~
$ day <fct> 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,~
$ m <dttm> 2014-01-06 15:01:40, 2014-01-07 12:40:38, 2014-01-~
$ geometry <LINESTRING [°]> LINESTRING (24.88258 36.066..., LINESTRI~
The stationary points of all cars over the span of 2 weeks is plotted on the map below with the blue dots. The car is deeemd to be stationary if it has stopped for more than 3 minutes.
POI <- gps_sf %>%
group_by(id) %>%
mutate(stoptime = Timestamp - lag(Timestamp)) %>%
mutate(parked = ifelse(stoptime > 60*3, TRUE, FALSE)) %>%
ungroup() %>%
filter(parked == TRUE) %>%
group_by(id, day) %>%
add_count(id, day, name = "count") %>%
ungroup()
POI_sf <- POI %>%
filter(parked == TRUE)
tmap_mode("view")
tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(POI_sf) +
tm_dots(col = "blue")
Going back to the anormalies assessment, we can use the vehicle tracking data to supplement our analysis.
The credit card used to make that purchase ends with 9551. Hence, we can pull out all the transactions made on that card on 13 Jan.
knitr::kable(cc_loyalty_data %>%
filter(last4ccnum == 9551 & date == dmy(13012014)) %>%
dplyr::select(timestamp, location, price, last4ccnum, loyaltynum) %>%
arrange(timestamp), "simple",
caption = "Transactions made using CC 9551 on 13 Jan")
| timestamp | location | price | last4ccnum | loyaltynum |
|---|---|---|---|---|
| 2014-01-13 06:04:00 | Daily Dealz | 2.01 | 9551 | NA |
| 2014-01-13 13:18:00 | U-Pump | 55.25 | 9551 | NA |
| 2014-01-13 13:28:00 | Hippokampos | 30.51 | 9551 | L5777 |
| 2014-01-13 19:20:00 | Frydos Autosupply n’ More | 10000.00 | 9551 | NA |
| 2014-01-13 19:30:00 | Ouzeri Elian | 28.75 | 9551 | L5777 |
On 13 Jan, CC 9551 was used at Daily Dealz, U-Pump, Hippokampos, Frydos and Ouzeri Elian.
The gps path of all vehicles on 13 Jan is plotted for reference. Looking at the map, we can observe that on 13 Jan, only Car ID 24 made a stop at U-Pump. Hence, it is worth investigating is the owner of the Car ID 24 is the same owner of CC 9551.
gps_path_2 <- gps_path %>%
filter(day==13)
tmap_mode("view")
tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(gps_path_2) +
tm_lines()
The stationary points of Car ID 24 on 13 Jan are as follows:
knitr::kable(POI_sf %>%
filter(id == 24 & day == 13) %>%
dplyr::select(id, Timestamp, geometry) %>%
arrange(Timestamp), "simple",
caption = "GPS Path of Car ID 24 on 13 Jan")
| id | Timestamp | geometry |
|---|---|---|
| 24 | 2014-01-13 07:32:00 | POINT (24.89881 36.06246) |
| 24 | 2014-01-13 08:07:00 | POINT (24.90124 36.05406) |
| 24 | 2014-01-13 11:16:00 | POINT (24.87958 36.04803) |
| 24 | 2014-01-13 11:46:00 | POINT (24.85761 36.07666) |
| 24 | 2014-01-13 12:31:00 | POINT (24.85757 36.07669) |
| 24 | 2014-01-13 13:22:00 | POINT (24.87149 36.06777) |
| 24 | 2014-01-13 17:57:00 | POINT (24.87958 36.04803) |
| 24 | 2014-01-13 19:29:00 | POINT (24.90178 36.05493) |
gps_24 <- gps_path %>%
filter (day == 13 & id == 24)
POI_24 <- POI_sf %>%
filter(id == 24 & day == 13)
tmap_mode("view")
tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(gps_24) +
tm_lines() +
tm_shape(POI_24) +
tm_dots(col = "blue")
Observations:
Conclusion: The user of Car ID 24 and the holder of CC 9551 might not be the same person as the card and car records do not match perfectly, making the transactions on CC 9551 very suspicious. We can infer that perhaps the CC 9951 was used by more than one person.
These five different transactions are all made by different credit cards, suggesting that it might be not a habit of an individual who prefers to grocery shop at night.
knitr::kable(cc_loyalty_data %>%
filter(location == "Kronos Mart" & hour == "3") %>%
dplyr::select(timestamp, location, price, last4ccnum, loyaltynum) %>%
arrange(timestamp), "simple",
caption = "Transactions made at Kronos Mart during the wee hours")
| timestamp | location | price | last4ccnum | loyaltynum |
|---|---|---|---|---|
| 2014-01-12 03:39:00 | Kronos Mart | 277.26 | 8156 | NA |
| 2014-01-13 03:00:00 | Kronos Mart | 147.30 | 5407 | NA |
| 2014-01-19 03:13:00 | Kronos Mart | 87.66 | 3484 | NA |
| 2014-01-19 03:45:00 | Kronos Mart | 194.51 | 9551 | NA |
| 2014-01-19 03:48:00 | Kronos Mart | 150.36 | 8332 | NA |
Looking at the stationary points of the cars on 12, 13 and 19 Jan, we aren’t able to point out which car is linked to the credit cards used at the mart. The only stationary points appearing near Kronos Mart are:
These timings all do not coincide with the timings in the wee hours.
Conclusion: The credit card holders of 8156, 5407, 3484, 9551 and 8332 either stay a walking distance from the Kronos Mart or they used their own personal cars instead of the company-provide cars to make the travel.
Looking at the Albert’s Fine Clothing suspicious transaction, we can see that the transaction occurred on 17 Jan at 19:44, using CC 1321 and L4149.
knitr::kable(spending_by_location %>%
filter(location %in% "Albert's Fine Clothing") %>%
filter(price %in% max(price)) %>%
dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum),
caption = "Transactions made in Albert's Fine Clothing")
| location | timestamp | date | hour | price | median_price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|---|
| Albert’s Fine Clothing | 2014-01-17 19:44:00 | 2014-01-17 | 19 | 1239.41 | 211.47 | 1321 | L4149 |
Hence, when we plot out the stationary locations on the map (on 17 Jan, between 1900-2000), we are able to observe that Car ID 11 left the clothing store at 19:46, which matched up to the suspicious transaction time. Hence, it can be implied that the owner of Car ID 11 might hold CC 1321 and L4149.
POI_sf$hour = hour(POI_sf$Timestamp)
POI_clothing <- POI_sf %>%
filter(day == 17 & hour == 19)
tmap_mode("view")
tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(POI_clothing) +
tm_dots(col = "blue")
Similarly, we can see that the dubious transaction at Chostus Hotel occurred on 18 Jan at 12:03. The credit card in question is 5010, tied to a loyalty card L2459.
knitr::kable(spending_by_location %>%
filter(location %in% "Chostus Hotel") %>%
filter(price %in% max(price)) %>%
dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum),
caption = "Transactions made in Chostus Hotel")
| location | timestamp | date | hour | price | median_price | last4ccnum | loyaltynum |
|---|---|---|---|---|---|---|---|
| Chostus Hotel | 2014-01-18 12:03:00 | 2014-01-18 | 12 | 600 | 114.22 | 5010 | L2459 |
Plotting the stationary points on the map, we can see that Car ID 31 left the hotel at 12:35, which matches up with the transaction data. Hence, it is implied that the owner of Car ID 31 might be the holder of CC 5010.
POI_hotel <- POI_sf %>%
filter(day == 18 & hour == 12)
tmap_mode("view")
tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(POI_hotel) +
tm_dots(col = "blue")
Moving on, we will be trying to see if we can derive the respective owners of the credit and loyalty cards.
The unique counts of the credit cards and loyalty cards are 55 and 54 respectively. This suggests that there might be duplicate usages of either credit or loyalty cards.
[1] 55
[1] 54
First, we will be joining the two datasets by the day, location and price.
cc_data$day <- as.factor(get_day(cc_data$timestamp))
loyalty_data$day <- as.factor(get_day(loyalty_data$timestamp))
ccloy <- merge(cc_data, loyalty_data,
by = c("day", "location", "price")) %>%
select("timestamp.y", "timestamp.x", "day", "location", "price", "last4ccnum", "loyaltynum")
glimpse(ccloy)
Rows: 1,087
Columns: 7
$ timestamp.y <dttm> 2014-01-10, 2014-01-10, 2014-01-10, 2014-01-10,~
$ timestamp.x <dttm> 2014-01-10 14:01:00, 2014-01-10 13:46:00, 2014-~
$ day <fct> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, ~
$ location <chr> "Abila Zacharo", "Abila Zacharo", "Abila Zacharo~
$ price <dbl> 16.12, 31.34, 36.38, 36.73, 8.22, 9.32, 126.13, ~
$ last4ccnum <dbl> 3492, 2681, 7819, 1310, 1874, 6691, 6895, 1877, ~
$ loyaltynum <chr> "L7814", "L1107", "L5259", "L8012", "L4424", "L6~
Then, we pulled out all of the unique tags to each credit and loyalty card. There is a total of 62 entries, which means that there might be duplicate usage of either credit or loyalty cards.
unique_tag <- ccloy %>%
select("last4ccnum", "loyaltynum") %>%
distinct(last4ccnum, loyaltynum, .keep_all = TRUE) %>%
ungroup()
unique_tag$last4ccnum = as.factor(unique_tag$last4ccnum)
DT::datatable(unique_tag,
options = list(
autoWidth = FALSE,
columnDefs = list(list(width = '1px',
className = 'dt-center',
targets = c(0,1,2))))) %>%
formatStyle(0,
target = 'row',
lineHeight='75%')
We also plotted an interactive parallel coordinates plot to see which credit cards and loyalty cards are linked together.
parcoords(
unique_tag[,1:2],
rownames = FALSE,
reorderable = T,
brushMode = '1D-axes')
Taking at a look at the duplicates, we can see that there are a total of 7 credit cards that are linked to 2 different loyalty cards each.
cc_duplicates = unique_tag[duplicated(unique_tag$last4ccnum)|duplicated(unique_tag$last4ccnum, fromLast=TRUE),] %>%
arrange(last4ccnum)
DT::datatable(cc_duplicates,
options = list(
autoWidth = FALSE,
columnDefs = list(list(width = '1px',
className = 'dt-center',
targets = c(0,1,2))))) %>%
formatStyle(0,
target = 'row',
lineHeight='75%')
Similarly, for the loyalty cards, there are a total of 8 loyalty cards that are used for 2 different credit card holders.
loyalty_dups = unique_tag[duplicated(unique_tag$loyaltynum)|duplicated(unique_tag$loyaltynum, fromLast=TRUE),] %>%
arrange(loyaltynum)
loyalty_dups <- loyalty_dups[, c("loyaltynum", "last4ccnum")]
DT::datatable(loyalty_dups,
options = list(
autoWidth = FALSE,
columnDefs = list(list(width = '1px',
className = 'dt-center',
targets = c(0,1,2))))) %>%
formatStyle(0,
target = 'row',
lineHeight='75%')
Yet there are uncertainties in the data. The credit card data is timestamped with both date and time while the loyalty card data is timestamped with only dates. This would mean that some matching of data might not be accurate, if there are more than one transactions of the same amount in the same day but at different timings, yet tagged to different credit and loyalty cards. Under this circumstance, it would make it difficult for us to determine which credit card is linked to which loyalty card.